Combinatorial Characterization of the Language Recognized by Factor and Suffix Oracles

Authors

Alban Mancheron

Christophe Moan

Abstract

Sequence analysis requires data structures that allow both efficient storage and efficient use. Among these are tries [1], suffix automata [1, 2], and suffix trees [1, 3]. Cyril Allauzen, Maxime Crochemore and Mathieu Raffinot introduced [4, 5, 6] a new data structure that is linear in the size of the represented word in both time and space, has the smallest number of states, and accepts at least all the substrings of the represented word. They called this structure a Factor Oracle. Building on it, they developed a second structure with the same properties, except that it accepts at least all the suffixes of the represented word. They called it the Suffix Oracle. The characterization of the language recognized by the Factor/Suffix Oracle of a word is an open problem, for which we provide a solution.
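To make the two structures concrete, here is a minimal sketch of the online construction usually attributed to Allauzen, Crochemore and Raffinot, written from the commonly cited description rather than taken from this paper. The function names (`build_oracle`, `run`, `factor_oracle_accepts`, `suffix_terminals`, `suffix_oracle_accepts`) are illustrative assumptions, and so is the choice of terminal states for the suffix oracle (the states on the supply path of the last state).

```python
from typing import Dict, List, Optional, Set, Tuple


def build_oracle(word: str) -> Tuple[List[Dict[str, int]], List[int]]:
    """Build transitions and supply links for the oracle of `word` (sketch)."""
    # transitions[i] maps a letter to a target state; states are 0..len(word).
    transitions: List[Dict[str, int]] = [{} for _ in range(len(word) + 1)]
    # supply[i] is the supply link of state i; -1 means "no link" (state 0 only).
    supply: List[int] = [-1] * (len(word) + 1)

    for i, letter in enumerate(word, start=1):
        transitions[i - 1][letter] = i  # transition along the word itself
        k = supply[i - 1]
        # Follow supply links, adding a shortcut wherever `letter` is missing.
        while k > -1 and letter not in transitions[k]:
            transitions[k][letter] = i
            k = supply[k]
        supply[i] = 0 if k == -1 else transitions[k][letter]
    return transitions, supply


def run(transitions: List[Dict[str, int]], text: str) -> Optional[int]:
    """Return the state reached by reading `text` from state 0, or None."""
    state = 0
    for letter in text:
        if letter not in transitions[state]:
            return None
        state = transitions[state][letter]
    return state


def factor_oracle_accepts(transitions: List[Dict[str, int]], text: str) -> bool:
    # In the factor oracle every state is accepting, so any readable path counts.
    return run(transitions, text) is not None


def suffix_terminals(supply: List[int]) -> Set[int]:
    # Assumption: terminal states are those on the supply path of the last state.
    terminals: Set[int] = set()
    k = len(supply) - 1
    while k > -1:
        terminals.add(k)
        k = supply[k]
    return terminals


def suffix_oracle_accepts(
    transitions: List[Dict[str, int]], supply: List[int], text: str
) -> bool:
    state = run(transitions, text)
    return state is not None and state in suffix_terminals(supply)


transitions, supply = build_oracle("abbbaab")
print(factor_oracle_accepts(transitions, "bba"))  # a genuine factor of the word
print(factor_oracle_accepts(transitions, "aba"))  # accepted, yet not a factor
```

The last line illustrates the "at least" in the abstract: for the word `abbbaab`, the oracle also reads `aba`, which does not occur in the word. Words of this kind are why characterizing the recognized language is a nontrivial question.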

Similar resources

Error analysis of factor oracles

Factor oracles [1] constructed from a given text are deterministic acyclic automata accepting all substrings of the text. Factor oracles are more space-economical and easier to implement than similar data structures such as suffix trees [6]. There is, however, a drawback: a factor oracle may accept strings not in the text, which we call an error acceptance. In this paper, we characterize factor or...
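As a small illustration of an error acceptance, the sketch below enumerates every word read by one factor oracle and subtracts the true substrings of the text. The transition table `ORACLE` is an assumption: it was derived by hand for the word `abbbaab` from the standard online construction and is not taken from the paper, so it is worth re-deriving before relying on it. The names `accepted_words` and `factors` are likewise illustrative.

```python
from typing import Dict, List, Set

WORD = "abbbaab"

# Assumed, hand-derived factor oracle of WORD: ORACLE[state][letter] -> state.
ORACLE: List[Dict[str, int]] = [
    {"a": 1, "b": 2},  # state 0
    {"b": 2, "a": 6},  # state 1
    {"b": 3, "a": 5},  # state 2
    {"b": 4, "a": 5},  # state 3
    {"a": 5},          # state 4
    {"a": 6},          # state 5
    {"b": 7},          # state 6
    {},                # state 7 (last state, no outgoing transitions)
]


def accepted_words(oracle: List[Dict[str, int]], state: int = 0, prefix: str = "") -> Set[str]:
    """Collect every word readable from `state`; terminates because the oracle is acyclic."""
    words = {prefix}
    for letter, target in oracle[state].items():
        words |= accepted_words(oracle, target, prefix + letter)
    return words


def factors(word: str) -> Set[str]:
    """All substrings of `word`, including the empty word."""
    return {word[i:j] for i in range(len(word) + 1) for j in range(i, len(word) + 1)}


# Error acceptances: words the oracle reads that never occur in WORD.
errors = accepted_words(ORACLE) - factors(WORD)
print(sorted(errors, key=lambda w: (len(w), w)))
```

Every substring of `abbbaab` is accepted, and the shortest error acceptance for this table is `aba`.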

Statistical Properties of Factor Oracles

Factor and suffix oracles have been introduced in [1] in order to provide an economic and efficient solution for storing all the factors and suffixes respectively of a given text. Whereas good estimations exist for the size of the factor/suffix oracle in the worst case, no average-case analysis has been done until now. In this paper, we give an estimation of the average size for the factor/suff...

Finding Maximal Repeats with Factor Oracles

Factor oracles, built from an input text, are automata similar to suffix automata that accept at least all substrings of the input text. In papers [LL00] and [LLA02], factor oracles are used to detect repeats in text. Although the repeats found with these methods are not maximal, the average error is very low and the algorithm runs quite fast. In this paper, we present two ideas to improve accuracy of t...

On the combinatorics of suffix arrays

We prove several combinatorial properties of suffix arrays, including a characterization of suffix arrays through a bijection with a certain well-defined class of permutations. Our approach is based on the characterization of Burrows-Wheeler arrays given in [1], that we apply by reducing suffix sorting to cyclic shift sorting through the use of an additional sentinel symbol. We show that the ch...

Challenges for Discontiguous Phrase Extraction

Suggestions are made as to how phrase extraction algorithms should be adapted to handle gapped phrases. Such variable phrases are useful for many purposes, including the characterization of learner texts. The basic problem is that there is a combinatorial explosion of such phrases. Any reasonable program must start by putting the exponentially many phrases into equivalence classes (Yamamoto and...

Publication date: 2004
